Mining broad hidden query aspects from user search sessions

ABSTRACT

An optimization-based framework is utilized to extract broad query aspects from query reformulations performed by users in historical user session logs. Objective functions are optimized to yield query aspects. At run-time, the best broad but unspecified query aspects relevant to any user query are presented along with the results of the run time query.

RELATED APPLICATIONS

This application is a divisional application and claims priority from application Ser. No. 12/332,187, Attorney Docket No. YAH1P188, entitled “Mining Broad Hidden Query Aspects from User Search Sessions,” by Punera et al., filed on Dec. 10, 2008, which is incorporated herein by reference in its entirety for all purposes.

BACKGROUND OF THE INVENTION

This invention relates generally to search engines and queries.

The World Wide Web has grown dramatically over the last few years and search engines have become the primary mode of discovering and accessing web content for a large fraction of the users. However, even though the users employ search engines for critical information access tasks, they are remarkably laconic in describing their information needs. This behavior might be an outgrowth of many factors. Users often use search engines for performing research on unfamiliar topics. Hence, they might skip important details in search queries because they aren't aware of them or haven't built up the correct vocabulary yet. In some other cases users neglect to add certain terms to queries because they believe the terms are obvious from the context or they aren't aware of other ambiguous senses of their incomplete queries. Search engines themselves might reinforce this behavior by not properly taking into account the extra information when the users do provide long descriptive queries.

SUMMARY OF THE INVENTION

A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.

Embodiments of the invention find query aspects that, although not specified by the user, may be what the user had in mind. Embodiments suggest these query aspects and, in some instances, run the query with the unspecified aspects. The aspects are tailored to be sufficiently broad to apply to many different queries while being specific enough to accurately describe the hidden intent of the user.

Embodiments employ an optimization-based framework to extract broad query aspects from query reformulations performed by users in historical user session logs. Objective functions are optimized to yield query aspects.

One aspect relates to a computer-implemented method for providing search results. The method comprises analyzing search logs for query reformulations, extracting query reformulations from the analysis of the search logs, clustering the extracted query reformulations into clusters, selecting a group of the clustered extracted query reformulations, selecting clustered query reformulations from among the group of clustered extracted query reformulations so as to maximize a similarity measure, and presenting the clustered extracted query reformulations along with the results of a search.

Another aspect relates to a computerized searching system. The system is configured to analyze search logs for (i) a first query by a user comprising a first search term, followed by (ii) a second query comprising the first search term and a qualifier not initially specified in the first query. The system is further configured to determine k aspects of the qualifier, receive an original query at run time, and present to the user in response to the original query at least one of the k aspects along with results of the original query.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of offline steps embodiments may utilize.

FIG. 2 is a flow chart of online steps embodiments may utilize.

FIGS. 3A and 3B are graphs illustrating the performance of different embodiments as compared to a baseline.

FIG. 4 is a simplified diagram of a computing environment in which embodiments of the invention may be implemented.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

Reference will now be made in detail to specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In addition, well known features may not have been described in detail to avoid unnecessarily obscuring the invention.

Query aspects may include query qualifiers (i.e., terms added to queries during reformulations). In certain embodiments of the invention, these reformulations are monitored and logged on a regular basis, at a time before a particular search of interest. Embodiments find such aspects, and upon receiving any original query at run time, the query qualifiers can be covered by some number of aspects, which are then presented to the user along with results of the original query. Actions taken before a current or new search is undertaken are referred to as “offline,” whereas actions taken to return search results for a new search may be referred to as “online” or as “run time.”

FIG. 1 is a flow chart illustrating offline activities. While such steps generally occur offline, or prior to run time of a current search, it should be understood that in some embodiments one or more of the steps may occur at run time.

In step 102, the system searches logs for query reformulations. For all or a subset of the query reformulations found, the system extracts and stores the query reformulation and optionally other information relating to the reformulations in step 106. In one embodiment, only a subset of query reformulations that exceed a threshold are utilized. For example, a threshold of query reformulations that result in a user click may be utilized. The threshold will of course vary depending on user traffic and the particular search engine and related databases, but in one example, only query reformulations resulting in more than about four to five hundred clicks and associated views of a page/site per month would be utilized.
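The extraction of steps 102 and 106 might be sketched as follows. This is a minimal illustration only: the session log format (a list of sessions, each an ordered list of query/click pairs), the function name `extract_qualifiers`, and the `min_clicks` parameter are assumptions made for the example and are not specified in this description.

```python
from collections import Counter


def extract_qualifiers(sessions, min_clicks=1):
    """Count clicked (query, qualifier) reformulations in session logs.

    `sessions` is a hypothetical log format: a list of sessions, each an
    ordered list of (query_string, clicked) pairs.
    """
    clicks = Counter()
    for session in sessions:
        # Look at each pair of consecutive queries in the session.
        for (first, _), (second, clicked) in zip(session, session[1:]):
            first_terms = first.lower().split()
            second_terms = second.lower().split()
            added = [t for t in second_terms if t not in first_terms]
            # A reformulation keeps every term of the first query and adds
            # at least one new term (the qualifier).
            if added and all(t in second_terms for t in first_terms):
                if clicked:
                    clicks[(first.lower(), " ".join(added))] += 1
    # Keep only reformulations whose click count reaches the threshold.
    return {pair: n for pair, n in clicks.items() if n >= min_clicks}
```

In practice the threshold would be set per search engine, in line with the monthly click volumes mentioned above; the value used here is only a placeholder.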

Next, in step 110 the system clusters the extracted reformulations. Modified star clustering is one of many methods that may be employed by embodiments of the invention in order to pick the set A of N query aspects. The aim is to build the set A such that, with the best k aspects being picked for each query, the total similarity between the query qualifiers and the corresponding k aspects per query is maximized, as seen in the table below.

Algorithm 2 Modified Star Clustering
 1: input: set of qualifiers (the union, over all queries q ∈ Q, of the qualifiers of q), qualifier frequencies L(υ) for every qualifier υ, threshold σ, N
 2: Create a graph whose vertex set is the set of qualifiers and whose edge set is ε = {(i,j) | cosSim(i,j) > σ}
 3: n ← 0, Left ← the set of qualifiers, A ← {φ}
 4: while n < N and Left ≠ {φ} do
 5:   hub ← argmax_(υ∈Left) L(υ)
 6:   spokes ← {i | (hub, i) ∈ ε}
 7:   star ← {hub} ∪ spokes
 8:   A ← A ∪ {star}
 9:   Left ← Left \ star
10:   n ← n + 1
11: end while
12: output: set A of at most N query aspects
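The listing above might be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: the description does not say how qualifiers are represented for the cosine similarity, so the sparse feature dictionaries, the helper `cos_sim`, the alphabetical tie-breaking, and all function and parameter names are hypothetical.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def modified_star_clustering(vectors, freq, sigma, max_aspects):
    """vectors: qualifier -> feature dict; freq: qualifier -> frequency L(v)."""
    names = list(vectors)
    # Edge set: pairs of distinct qualifiers with cosine similarity above sigma.
    neighbours = {v: set() for v in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cos_sim(vectors[a], vectors[b]) > sigma:
                neighbours[a].add(b)
                neighbours[b].add(a)
    left = set(names)
    aspects = []
    while len(aspects) < max_aspects and left:
        # Hub: most frequent remaining qualifier (ties broken alphabetically,
        # an assumption made here for determinism).
        hub = max(sorted(left), key=lambda v: freq[v])
        # As in the listing, spokes are all neighbours of the hub, whether or
        # not they already belong to an earlier star.
        star = {hub} | neighbours[hub]
        aspects.append(star)
        left -= star
    return aspects
```

Choosing the hub by frequency means that each aspect is anchored on a qualifier that many users actually typed, which is one plausible reading of why frequencies are an input to the listing.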

Further details on the modified star clustering process can be found in a paper by Javed A. Aslam, Ekaterina Pelekhov, and Daniela Rus, entitled “The Star Clustering Algorithm for Static and Dynamic Information Organization,” published in the Journal of Graph Algorithms and Applications, 8(1), 2004, hereby incorporated by reference in its entirety. Any other clustering technique may be employed, although the modified star technique is preferred. One advantage of the modified star method is that it does not require specification of how many clusters are desired. Other examples of clustering techniques that may be employed include original star, K-means, expectation maximization (“EM”), or Metis.

In step 114, the system makes an inter cluster (local) move to maximize the number of user queries covered with the facet clusters that have been created. An embodiment of the local search technique associated with the inter cluster move is described in the table below.

Algorithm 4 Local-Search
 1: input: set of queries Q, set of qualifiers (the union, over all queries q ∈ Q, of the qualifiers of q), maximum number of query aspects N
 2: Initialize the set of query aspects A to the output of Algorithm 2 with at most N aspects
 3: Compute the best k query aspects from A for each original query using Algorithm 1
 4: repeat
 5:   reselectK = “no”
 6:   move ← Best-Local-Move(Q, A, reselectK)
 7:   if move = φ then
 8:     reselectK = “yes”
 9:     move ← Best-Local-Move(Q, A, reselectK)
10:   end if
11:   if move ≠ φ then
12:     Update A according to move
13:     if reselectK = “yes” then
14:       Recompute the best k query aspects from the new A for each original query using Algorithm 1
15:     end if
16:   end if
17: until move = φ
18: output: set A of at most N query aspects
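The control flow of the Local-Search listing might be sketched as below. Because Best-Local-Move is not defined in this description, the sketch takes it as a caller-supplied function; `best_local_move`, `apply_move`, and `assign_best_k` are hypothetical helpers, and passing the current per-query assignment to `best_local_move` is an assumption of this example rather than part of the listing.

```python
def local_search(queries, aspects, best_local_move, apply_move, assign_best_k):
    """Skeleton of the Local-Search loop with caller-supplied helpers.

    best_local_move(queries, aspects, assignment, reselect_k) -> move or None
    apply_move(aspects, move) -> updated aspects
    assign_best_k(queries, aspects) -> per-query choice of k aspects
    """
    # Line 3 of the listing: best k aspects for each query.
    assignment = assign_best_k(queries, aspects)
    while True:
        reselect_k = False
        move = best_local_move(queries, aspects, assignment, reselect_k)
        if move is None:
            # No improving move under the current assignment, so retry while
            # allowing the per-query aspects to be re-selected.
            reselect_k = True
            move = best_local_move(queries, aspects, assignment, reselect_k)
        if move is None:
            return aspects  # "until move = φ"
        aspects = apply_move(aspects, move)
        if reselect_k:
            assignment = assign_best_k(queries, aspects)
```

Trying moves first with the assignment held fixed, and re-selecting only when that fails, keeps the more expensive recomputation of the best k aspects off the common path.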

Then in step 118 the system picks a subset of clusters from step 114. The number of clusters chosen and the methodology of choosing the clusters may vary. In one embodiment the top 50-150 clusters are chosen, preferably the top 100.

FIG. 2 is a flow chart of online steps embodiments may utilize. In step 202, the system will receive a search query from a user. Then in step 206, the system will pick k aspects. In one embodiment this is done according to the pick-k process described below. Of course it should be understood that this may be done in numerous other ways.

The Pick-k Process

Given a set A of query aspects, and a query q, the method picks k aspects $a_1, \ldots, a_k \in A$ so as to maximize the similarity measure $F(l(q), \cup_{i=1}^{k} a_i)$. Embodiments maximize any similarity function of the form

$S(X, Y_1, \ldots, Y_k) = \frac{f_0(X) + \sum_{i} f(X, Y_i)}{g_0(X) + \sum_{i} g(X, Y_i)}, \qquad (5)$

where X and Y_i are vectors in some finite-dimensional space, the functions g₀( ) and g( ) are non-negative, X is fixed from the start, and the Y_i vectors must be picked from a set Y.

Algorithm 1 Pick-k
 1: input: k, vector X, set of vectors γ
 2: α ← f₀(X)/k, β ← g₀(X)/k, Y ← {φ}, n ← k
 3: while n > 0 do
 4:   M ← {arg max_i (f(X,Y_i) + α/n) / (g(X,Y_i) + β/n)}
 5:   If |M| > n, then keep any n elements in M and throw away the rest
 6:   Y ← Y ∪ (∪_(m∈M) Y_m)
 7:   α ← α + Σ_(m∈M) f(X,Y_m)
 8:   β ← β + Σ_(m∈M) g(X,Y_m)
 9:   n ← n − |M|
10: end while
11: output: picked elements Y ⊂ γ
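A minimal Python sketch of the Pick-k listing follows, offered as an illustration rather than a definitive implementation. It follows the listing as written, including the initialization of α and β in line 2. The function name `pick_k`, the choice to return indices, the rule that a candidate is never picked twice, and the requirement that the denominator stay positive are assumptions of this example.

```python
def pick_k(k, x, candidates, f0, f, g0, g):
    """Greedy selection of up to k candidates, following the Pick-k listing.

    candidates: list of vectors of any type accepted by f and g.
    Returns the indices of the picked candidates, in pick order.
    Assumes g(x, y) + beta / n > 0 for every candidate.
    """
    alpha, beta = f0(x) / k, g0(x) / k  # line 2 of the listing, as written
    picked, n = [], k
    # Assumption of this sketch: each candidate can be picked at most once.
    remaining = list(range(len(candidates)))
    while n > 0 and remaining:
        def score(i):
            y = candidates[i]
            return (f(x, y) + alpha / n) / (g(x, y) + beta / n)

        best = max(score(i) for i in remaining)
        # M: all maximizers, truncated to at most n elements (line 5).
        ties = [i for i in remaining if score(i) == best][:n]
        picked.extend(ties)
        alpha += sum(f(x, candidates[i]) for i in ties)
        beta += sum(g(x, candidates[i]) for i in ties)
        remaining = [i for i in remaining if i not in ties]
        n -= len(ties)
    return picked
```

As a usage illustration, X and the candidates can be sets of qualifiers with f(X, Y) = |X ∩ Y|, g(X, Y) = |Y|, f₀(X) = 0, and g₀(X) = |X|; this particular choice of functions is an example of the general form in equation (5), not one prescribed by this description.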

Then in step 210 the system will provide the k query aspects alongside the search results. In other words, it will cause a client computer to display the query aspects alongside the query results.

FIGS. 3A and 3B are graphs illustrating the performance of different embodiments (of selecting and presenting broad hidden query aspects) as compared to a baseline. FIG. 3A illustrates a performance comparison of the embodiments based on one broad aspect, that is k=1, whereas FIG. 3B illustrates a performance comparison of the embodiments based on three broad aspects, that is k=3. Bar 300 represents the baseline. Bar 302 represents an embodiment that employs original star clustering (ORGSTAR), without the local (inter cluster) move of step 114 described above. Bar 304 represents an embodiment that employs modified star clustering in step 110 (MODSTAR) given in Algorithm 2, but without the local (inter cluster) move of step 114 described above and given in Algorithm 4 (LOCSEARCH). Bar 306 represents an embodiment that employs modified star clustering in step 110, together with the Pick-k algorithm of step 206. Bar 308 represents an embodiment that employs modified star clustering in step 110, together with the local (inter cluster) move of step 114, and the Pick-k algorithm of step 206.

Searches in accordance with embodiments of the invention may be performed in some centralized manner. This is represented in FIG. 4 by server 408 and data store 410 which, as will be understood, may correspond to multiple distributed devices and data stores. The invention may also be practiced in a wide variety of network environments including, for example, TCP/IP-based networks, telecommunications networks, wireless networks, public networks, private networks, various combinations of these, etc. Such networks, as well as the potentially distributed nature of some implementations, are represented by network 412.

In addition, the computer program instructions with which embodiments of the invention are implemented may be stored in any type of tangible computer-readable media, and may be executed according to a variety of computing models including a client/server model, a peer-to-peer model, on a stand-alone computing device, or according to a distributed computing model in which various of the functionalities described herein may be effected or employed at different locations.

The above described embodiments have several advantages and are distinct from prior methods. For example, the extraction of broad aspects from query logs, and their use in query refinement, have several advantages over prior query suggestion methods. The first advantage has to do with the discovery and use of broad aspects and query suggestions. The broad nature of the query aspects ensures that enough data is available to reliably construct these aspects and predict when they apply to user queries. This is in contrast to query suggestions, which are often applicable only to specific queries and hence learned from a significantly smaller amount of data. The availability of more data for analysis also implies that the technique avoids presenting the user with redundant query refinement options, as is often the case with query suggestions. Since by definition there are fewer broad aspects of queries than query suggestions, they can be better maintained without the need for manual intervention.

The second and more principal advantage is more subtle, and concerns the way users navigate the search results page. It has been shown in user eye-tracking studies as well as by modeling user clicking behavior that users scan search result pages extremely quickly and don't make a complete determination of the relevance of results before clicking. Users therefore acclimate to repetitive features in the search results page and use them to make clicking decisions. For example, the bolded words in the title of the result indicate to users that the title matched the query very closely, while the indented search result indicates to the user that this search result is somehow related to the previous one. When users are exposed to query suggestions, which by definition are specialized to the current query, they have to carefully read the suggested queries in order to decide whether to click on them. Since the users scan result pages very fast, they often skip the suggested queries as irrelevant content. When a limited number of broad aspects of queries, for example “Reviews and Ratings,” are used as options for refinement, the user needs less attention to interpret the aspects when they are presented.

While the invention has been particularly shown and described with reference to specific embodiments thereof, it will be understood by those skilled in the art that changes in the form and details of the disclosed embodiments may be made without departing from the spirit or scope of the invention.

In addition, although various advantages, aspects, and objects of the present invention have been discussed herein with reference to various embodiments, it will be understood that the scope of the invention should not be limited by reference to such advantages, aspects, and objects. Rather, the scope of the invention should be determined with reference to the appended claims.

What is claimed is:
 1. A computer-implemented method for providing search results, comprising: analyzing search logs for (i) a first query comprising a first search term, followed by (ii) a second query comprising the first search term and a qualifier not initially specified in the first query; determining k aspects of the qualifier; receiving an original query at run time; and providing in response to the original query at least one of the k aspects along with results of the original query.
 2. The method of claim 1, wherein determining k aspects of the qualifier comprises clustering the first search term and qualifier.
 3. The method of claim 2, wherein determining k aspects of the qualifier further comprises selecting from clusters resulting from the clustering.
 4. The method of claim 2, wherein determining k aspects of the qualifier further comprises an inter cluster move of an aspect from a first cluster to a second cluster.
 5. The method of claim 1, wherein determining k aspects of the qualifier comprises applying modified star clustering.
 6. The method of claim 1, wherein determining k aspects of the qualifier comprises applying k-means clustering.
 7. A computerized searching system configured to: analyze search logs for (i) a first query comprising a first search term, followed by (ii) a second query comprising the first search term and a qualifier not initially specified in the first query; determine k aspects of the qualifier; receive an original query at run time; and provide in response to the original query at least one of the k aspects along with results of the original query.
 8. The system of claim 7, wherein determining k aspects of the qualifier comprises clustering the first search term and qualifier.
 9. The system of claim 8, wherein determining k aspects of the qualifier further comprises selecting from clusters resulting from the clustering.
 10. The system of claim 8, wherein determining k aspects of the qualifier further comprises an inter cluster move of an aspect from a first cluster to a second cluster.
 11. The system of claim 7, wherein determining k aspects of the qualifier comprises applying modified star clustering.
 12. The system of claim 7, wherein determining k aspects of the qualifier comprises applying k-means clustering.
 13. The system of claim 7, wherein the original query comprises the first search term.
 14. At least one computer readable storage medium having computer program instructions stored thereon that are arranged to perform the following operations: analyzing search logs for (i) a first query comprising a first search term, followed by (ii) a second query comprising the first search term and a qualifier not initially specified in the first query; determining k aspects of the qualifier; receiving an original query at run time; and providing in response to the original query at least one of the k aspects along with results of the original query.
 15. The computer readable storage medium of claim 14, wherein determining k aspects of the qualifier comprises clustering the first search term and qualifier.
 16. The computer readable storage medium of claim 15, wherein determining k aspects of the qualifier further comprises selecting from clusters resulting from the clustering.
 17. The computer readable storage medium of claim 15, wherein determining k aspects of the qualifier further comprises an inter cluster move of an aspect from a first cluster to a second cluster.
 18. The computer readable storage medium of claim 14, wherein determining k aspects of the qualifier comprises applying modified star clustering.
 19. The computer readable storage medium of claim 14, wherein determining k aspects of the qualifier comprises applying k-means clustering.
 20. The computer readable storage medium of claim 14, wherein the original query comprises the first search term.